AITopics

2605.11511

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Neural Information Processing Systems, Mar-23-2026, 13:39:26 GMT

Selective inference for group-sparse linear models

Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

Neural Information Processing Systems http://nips.cc/

artificial intelligence, inference, machine learning

Country:

North America > United States (0.29)
Europe > Austria (0.28)

Genre: Research Report (0.97)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Neural Information Processing Systems, Feb-11-2026, 23:22:49 GMT

cd706106802dbea2068efd7031c3b420-Paper-Conference.pdf

inference, segmentation result, selection event

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kim, Dongseok, Choi, Hyoungsun, Rasool, Mohamed Jismy Aashik, Oh, Gisung

$ϕ$-test: Global Feature Selection and Inference for Shapley Additive Explanations

arXiv.org Machine Learning, Dec-9-2025

We propose $ϕ$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an evaluation dataset, $ϕ$-test performs SHAP-guided screening and fits a linear surrogate on the screened features via a selection rule with a tractable selective-inference form. For each retained feature, it outputs a Shapley-based global score, a surrogate coefficient, and post-selection $p$-values and confidence intervals in a global feature-importance table. Experiments on real tabular regression tasks with tree-based and neural backbones suggest that $ϕ$-test can retain much of the predictive ability of the original model while using only a few features and producing feature sets that remain fairly stable across resamples and backbone classes. In these settings, $ϕ$-test acts as a practical global explanation layer linking Shapley-based importance summaries with classical statistical inference.

confidence interval, explanation, global feature selection and inference

2512.07578

Genre: Research Report > Experimental Study (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Tasaka, Rieko, Kimura, Tatsuya, Suzuki, Joe

Spacing Test for Fused Lasso

arXiv.org Artificial Intelligence, Nov-13-2025

Detecting changepoints in a one-dimensional signal is a classical yet fundamental problem. The fused lasso provides an elegant convex formulation that produces a stepwise estimate of the mean, but quantifying the uncertainty of the detected changepoints remains difficult. Post-selection inference (PSI) offers a principled way to compute valid $p$-values after a data-driven selection, but its application to the fused lasso has been considered computationally cumbersome, requiring the tracking of many "hit" and "leave" events along the regularization path. In this paper, we show that the one-dimensional fused lasso has a surprisingly simple geometry: each changepoint enters in a strictly one-sided fashion, and there are no leave events. This structure implies that the so-called conservative spacing test of Tibshirani et al. (2016), previously regarded as an approximation, is in fact exact. The truncation region in the selective law reduces to a single lower bound given by the next knot on the LARS path. As a result, the exact selective $p$-value takes a closed form identical to the simple spacing statistic used in the LARS/lasso setting, with no additional computation. This finding establishes one of the rare cases in which an exact PSI procedure for the generalized lasso admits a closed-form pivot. We further validate the result by simulations and real data, confirming both exact calibration and high power.

Keywords: fused lasso; changepoint detection; post-selection inference; spacing test; monotone LASSO
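A one-sided truncation of this kind can be illustrated with a minimal Python sketch. It assumes only what the abstract states: a statistic that is Gaussian under the null with known standard deviation, truncated from below by a single bound. The function name `one_sided_selective_pvalue` and the example numbers are hypothetical, and the paper's exact scaling of the knots is not reproduced here.

```python
import math

def gaussian_sf(x):
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_sided_selective_pvalue(stat, lower, sd=1.0):
    """P(T >= stat | T >= lower) for T ~ N(0, sd^2).

    `stat` plays the role of the current knot and `lower` the next knot;
    both names are illustrative assumptions, not the paper's notation.
    """
    if stat < lower:
        raise ValueError("the observed statistic must lie in the truncation region")
    return gaussian_sf(stat / sd) / gaussian_sf(lower / sd)

# Hypothetical knots: a large gap between them yields a small p-value,
# while no gap at all yields a p-value of one.
p_large_gap = one_sided_selective_pvalue(3.0, 1.0)
p_no_gap = one_sided_selective_pvalue(2.0, 2.0)
```

The ratio of two Gaussian tail probabilities is the whole computation, which is why no tracking of further path events is needed in this simplified setting.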

artificial intelligence, fused lasso, inference

2509.14229

Country: Asia > Japan (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.46)

Jason D. Lee, Yuekai Sun, Jonathan E. Taylor

Evaluating the statistical significance of biclusters

Neural Information Processing Systems, Oct-2-2025, 05:32:23 GMT

Biclustering (also known as submatrix localization) is a problem of high practical relevance in exploratory analysis of high-dimensional data. We develop a framework for performing statistical inference on biclusters found by score-based algorithms. Since the bicluster was selected in a data-dependent manner by a biclustering or localization algorithm, this is a form of selective inference. Our framework gives exact (non-asymptotic) confidence intervals and p-values for the significance of the selected biclusters.

algorithm, selection event, submatrix

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
Information Technology > Artificial Intelligence > Machine Learning (0.70)

Liu, Sifan, Panigrahi, Snigdha

Flexible Selective Inference with Flow-based Transport Maps

arXiv.org Machine Learning, Jun-3-2025

Data-carving methods perform selective inference by conditioning the distribution of data on the observed selection event. However, existing data-carving approaches typically require an analytically tractable characterization of the selection event. This paper introduces a new method that leverages tools from flow-based generative modeling to approximate a potentially complex conditional distribution, even when the underlying selection event lacks an analytical description -- take, for example, the data-adaptive tuning of model parameters. The key idea is to learn a transport map that pushes forward a simple reference distribution to the conditional distribution given selection. This map is efficiently learned via a normalizing flow, without imposing any further restrictions on the nature of the selection event. Through extensive numerical experiments on both simulated and real data, we demonstrate that this method enables flexible selective inference by providing: (i) valid p-values and confidence sets for adaptively selected hypotheses and parameters, (ii) a closed-form expression for the conditional density function, enabling likelihood-based and quantile-based inference, and (iii) adjustments for intractable selection steps that can be easily integrated with existing methods designed to account for the tractable steps in a selection procedure involving multiple steps.
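Point (ii) is an instance of the change-of-variables formula for an invertible map. As a sketch, with the symbols $T$ and $\rho$ introduced here for illustration (they are not taken from the paper): if $T$ is a differentiable, invertible map that pushes a simple reference density $\rho$ forward to the conditional density given selection, then

$$
p(x \mid \text{selection}) \;=\; \rho\big(T^{-1}(x)\big)\,\big|\det \nabla_x T^{-1}(x)\big| .
$$

For a learned map, the right-hand side is the exact density of the flow's output and serves as an approximation to the true conditional density; it can be evaluated pointwise whenever the inverse map and its Jacobian determinant are available.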

artificial intelligence, machine learning, modeling & simulation

2506.0115

Country: North America > United States > Michigan (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Niihori, Mizuki, Katsuoka, Teruyuki, Shiraishi, Tomohiro, Nishino, Shuichi, Takeuchi, Ichiro

Statistically Significant $k$NNAD by Selective Inference

arXiv.org Machine Learning, Feb-18-2025

In this paper, we investigate the problem of unsupervised anomaly detection using the k-Nearest Neighbor method. The k-Nearest Neighbor Anomaly Detection (kNNAD) is a simple yet effective approach for identifying anomalies across various domains and fields. A critical challenge in anomaly detection, including kNNAD, is appropriately quantifying the reliability of detected anomalies. To address this, we formulate kNNAD as a statistical hypothesis test and quantify the probability of false detection using $p$-values. The main technical challenge lies in performing both anomaly detection and statistical testing on the same data, which hinders correct $p$-value calculation within the conventional statistical testing framework. To resolve this issue, we introduce a statistical hypothesis testing framework called Selective Inference (SI) and propose a method named Statistically Significant kNNAD (Stat-kNNAD). By leveraging SI, the Stat-kNNAD method ensures that detected anomalies are statistically significant with theoretical guarantees. The proposed Stat-kNNAD method is applicable to anomaly detection in both the original feature space and latent feature spaces derived from deep learning models. Through numerical experiments on synthetic data and applications to industrial product anomaly detection, we demonstrate the validity and effectiveness of the Stat-kNNAD method.
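The detection step on its own can be sketched in a few lines of Python. This is a minimal illustration, assuming the anomaly score is the Euclidean distance to the k-th nearest neighbor, which is one common choice and not necessarily the paper's; the function name `knn_anomaly_scores` and the toy points are hypothetical, and the selective-inference correction is not implemented here.

```python
import math

def knn_anomaly_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Toy data (hypothetical): four clustered points and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_anomaly_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Because the flagged point is chosen by looking at the data, a conventional test applied to the same data ignores that selection step, which is the difficulty the abstract says Selective Inference is introduced to resolve.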

data mining, machine learning, selection event

2502.12978

Genre:

Research Report > New Finding (0.47)
Research Report > Experimental Study (0.33)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)

Jason D. Lee, Jonathan E. Taylor

Exact Post Model Selection Inference for Marginal Screening

Neural Information Processing Systems, Feb-9-2025, 15:24:28 GMT

We develop a framework for post model selection inference, via marginal screening, in linear regression. At the core of this framework is a result that characterizes the exact distribution of linear functions of the response y, conditional on the model being selected ("condition on selection" framework). This allows us to construct valid confidence intervals and hypothesis tests for regression coefficients that account for the selection procedure. In contrast to recent work in high-dimensional statistics, our results are exact (non-asymptotic) and require no eigenvalue-like assumptions on the design matrix X. Furthermore, the computational cost of marginal regression, constructing confidence intervals and hypothesis testing is negligible compared to the cost of linear regression, thus making our methods particularly suitable for extremely large datasets. Although we focus on marginal screening to illustrate the applicability of the condition on selection framework, this framework is much more broadly applicable. We show how to apply the proposed framework to several other selection procedures including orthogonal matching pursuit and marginal screening+Lasso.
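The "condition on selection" idea can be sketched in plain Python. This is a minimal illustration under stated assumptions: y is Gaussian with independent noise of known standard deviation, the top k features by |x_j · y| are selected, and the observed signs are conditioned on as well. In that case the selection event is a set of linear inequalities a · y <= 0, and a linear function eta · y is a Gaussian truncated to an interval. All function names are hypothetical and the toy data are made up.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def screening_constraints(X_cols, y, k):
    """Return the selected indices and rows a with a.y <= 0 encoding
    's_j * x_j.y >= +/- x_m.y' for selected j and unselected m."""
    scores = [dot(x, y) for x in X_cols]
    order = sorted(range(len(X_cols)), key=lambda j: -abs(scores[j]))
    selected, rest = order[:k], order[k:]
    rows = []
    for j in selected:
        s = 1.0 if scores[j] >= 0 else -1.0
        for m in rest:
            for sign in (1.0, -1.0):
                rows.append([sign * xm - s * xj
                             for xm, xj in zip(X_cols[m], X_cols[j])])
    return selected, rows

def truncation_interval(rows, y, eta):
    """Interval [lo, hi] that eta.y must lie in, holding fixed the part
    of y that is independent of eta.y (isotropic noise assumed)."""
    t = dot(eta, y)
    c = [e / dot(eta, eta) for e in eta]
    z = [yi - ci * t for yi, ci in zip(y, c)]
    lo, hi = -math.inf, math.inf
    for a in rows:
        ac, az = dot(a, c), dot(a, z)
        if ac < 0:
            lo = max(lo, -az / ac)
        elif ac > 0:
            hi = min(hi, -az / ac)
    return t, lo, hi

def truncated_gaussian_pivot(t, lo, hi, mu, sd):
    """CDF of N(mu, sd^2) truncated to [lo, hi], evaluated at t."""
    a, b = norm_cdf((lo - mu) / sd), norm_cdf((hi - mu) / sd)
    return (norm_cdf((t - mu) / sd) - a) / (b - a)

# Toy example (hypothetical data): three unit-norm features, k = 1, sigma = 1.
X_cols = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
y = [3.0, 1.0, -0.5, 0.2]
selected, rows = screening_constraints(X_cols, y, k=1)
eta = X_cols[selected[0]]
t, lo, hi = truncation_interval(rows, y, eta)
pivot = truncated_gaussian_pivot(t, lo, hi, mu=0.0, sd=math.sqrt(dot(eta, eta)))
p_value = 1.0 - pivot  # one-sided test of eta.mu = 0 against eta.mu > 0
```

The cost beyond the screening itself is one pass over the constraint rows, which is consistent with the abstract's remark that inference is cheap relative to fitting the regression.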

artificial intelligence, confidence interval, machine learning

Country:

Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.05)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Virginia (0.04)

Genre: Research Report (0.88)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)